A Conversation with Alan Gelfand
Alan E. Gelfand was born April 17, 1945, in the Bronx, New York. He attended
public grade schools and did his undergraduate work at what was then called
City College of New York (CCNY, now CUNY), excelling at mathematics. He then
surprised and saddened his mother by going all the way across the country to
Stanford to graduate school, where he completed his dissertation in 1969 under
the direction of Professor Herbert Solomon, making him an academic grandson of
Herman Rubin and Harold Hotelling. Alan then accepted a faculty position at the
University of Connecticut (UConn) where he was promoted to tenured associate
professor in 1975 and to full professor in 1980. A few years later he became
interested in decision theory, then empirical Bayes, which eventually led to
the publication of Gelfand and Smith [J. Amer. Statist. Assoc. 85 (1990)
398-409], the paper that introduced the Gibbs sampler to most statisticians and
revolutionized Bayesian computing. In the mid-1990s, Alan's interests turned
strongly to spatial statistics, leading to fundamental contributions in
spatially-varying coefficient models, coregionalization, and spatial boundary
analysis (wombling). He spent 33 years on the faculty at UConn, retiring in
2002 to become the James B. Duke Professor of Statistics and Decision Sciences
at Duke University, serving as chair from 2007 to 2012. At Duke, he has continued
his work in spatial methodology while increasing his impact in the
environmental sciences. To date, he has published over 260 papers and 6 books;
he has also supervised 36 Ph.D. dissertations and 10 postdocs. This interview
was done just prior to a conference of his family, academic descendants, and
colleagues to celebrate his 70th birthday and his contributions to statistics
which took place on April 19-22, 2015 at Duke University.
Comment: Published at http://dx.doi.org/10.1214/15-STS521 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).
Centered Partition Process: Informative Priors for Clustering
There is a very rich literature proposing Bayesian approaches for clustering
starting with a prior probability distribution on partitions. Most approaches
assume exchangeability, leading to simple representations in terms of
Exchangeable Partition Probability Functions (EPPF). Gibbs-type priors
encompass a broad class of such cases, including Dirichlet and Pitman-Yor
processes. Even though there have been some proposals to relax the
exchangeability assumption, allowing covariate-dependence and partial
exchangeability, limited consideration has been given to how to include
concrete prior knowledge on the partition. For example, we are motivated by an
epidemiological application, in which we wish to cluster birth defects into
groups and we have prior knowledge of an initial clustering provided by
experts. As a general approach for including such prior knowledge, we propose a
Centered Partition (CP) process that modifies the EPPF to favor partitions
close to an initial one. Some properties of the CP prior are described, a
general algorithm for posterior computation is developed, and we illustrate the
methodology through simulation examples and an application to the motivating
epidemiology study of birth defects.
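As a concrete illustration of the construction above, the following Python sketch builds a CP-style prior on a small label set by multiplying a Dirichlet-process EPPF by a distance penalty exp(-psi * d(c, c0)). The Binder-type pairwise distance, the penalty weight psi, and all helper names are illustrative choices under stated assumptions, not the authors' implementation.

    import math
    from itertools import combinations

    def partitions(items):
        """Recursively enumerate all set partitions of a list of items."""
        if not items:
            yield []
            return
        first, rest = items[0], items[1:]
        for part in partitions(rest):
            # put `first` into each existing block, or into a new block
            for i in range(len(part)):
                yield part[:i] + [part[i] + [first]] + part[i + 1:]
            yield [[first]] + part

    def dp_eppf(part, alpha=1.0):
        """Dirichlet-process EPPF (unnormalized): alpha^K * prod (|block|-1)!."""
        return alpha ** len(part) * math.prod(math.factorial(len(b) - 1) for b in part)

    def binder_distance(p1, p2, items):
        """Number of item pairs on which the two partitions disagree."""
        def together(part, i, j):
            return any(i in b and j in b for b in part)
        return sum(together(p1, i, j) != together(p2, i, j)
                   for i, j in combinations(items, 2))

    items = list(range(5))
    c0 = [[0, 1], [2, 3], [4]]   # elicited initial clustering (illustrative)
    psi = 2.0                    # penalty: larger values center more tightly on c0

    # CP-style prior: base EPPF times a distance penalty, then normalize
    weights = {tuple(map(tuple, c)):
               dp_eppf(c) * math.exp(-psi * binder_distance(c, c0, items))
               for c in partitions(items)}
    total = sum(weights.values())
    for c, w in sorted(weights.items(), key=lambda kv: -kv[1])[:5]:
        print(f"p = {w / total:.3f}  partition = {c}")

Increasing psi concentrates the prior around the elicited clustering c0, while psi = 0 recovers the base EPPF.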
Spatial predictions on physically constrained domains: Applications to Arctic sea salinity data
In this paper, we predict sea surface salinity (SSS) in the Arctic Ocean
based on satellite measurements. SSS is a crucial indicator for ongoing changes
in the Arctic Ocean and can offer important insights about climate change. We
particularly focus on areas of water mistakenly flagged as ice by satellite
algorithms. To remove bias in the retrieval of salinity near sea ice, the
algorithms use conservative ice masks, which result in considerable loss of
data. We aim to produce realistic SSS values for such regions to obtain a more
complete understanding of the SSS surface over the Arctic Ocean and to benefit
future applications that may require SSS measurements near edges of sea ice or
coasts. We propose a class of scalable nonstationary processes that can handle
large data from satellite products and complex geometries of the Arctic Ocean.
Our Barrier Overlap-Removal Acyclic directed graph GP (BORA-GP) constructs sparse
directed acyclic graphs (DAGs) with neighbors conforming to barriers and
boundaries, enabling characterization of dependence in constrained domains. The
BORA-GP models produce more sensible SSS values in regions without satellite
measurements and show improved performance in various constrained domains in
simulation studies compared to state-of-the-art alternatives. The R package is
available at https://github.com/jinbora0720/boraGP.
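The core geometric idea, neighbor sets that conform to barriers, can be sketched as follows. This Python snippet is a simplified illustration of barrier-aware DAG parent selection under an assumed straight-segment visibility rule, not the boraGP implementation; the single-segment barrier and all names are illustrative.

    import numpy as np

    def segments_intersect(p1, p2, q1, q2):
        """True if open segments p1-p2 and q1-q2 cross (general position)."""
        def cross(o, a, b):
            return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
        d1, d2 = cross(q1, q2, p1), cross(q1, q2, p2)
        d3, d4 = cross(p1, p2, q1), cross(p1, p2, q2)
        return (d1*d2 < 0) and (d3*d4 < 0)

    def barrier_aware_parents(coords, barrier, m=3):
        """For each location (in a fixed ordering), keep up to m nearest
        previous locations whose connecting segment does not cross the
        barrier; these form the sparse DAG parent sets."""
        n = len(coords)
        parents = [[] for _ in range(n)]
        for i in range(1, n):
            d = np.linalg.norm(coords[:i] - coords[i], axis=1)
            for j in np.argsort(d):
                if len(parents[i]) == m:
                    break
                if not segments_intersect(coords[i], coords[j], *barrier):
                    parents[i].append(int(j))
        return parents

    rng = np.random.default_rng(0)
    coords = rng.uniform(0, 1, size=(50, 2))                   # locations
    barrier = (np.array([0.5, -0.1]), np.array([0.5, 0.6]))    # a "wall"
    print(barrier_aware_parents(coords, barrier)[:5])

Because neighbors across the barrier are excluded, dependence is not smoothed through land or ice the way an unconstrained nearest-neighbor GP would smooth it.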
Nonparametric Bayes Shrinkage for Assessing Exposures to Mixtures Subject to Limits of Detection
Assessing potential associations between exposures to complex mixtures and health outcomes may be complicated by a lack of knowledge of causal components of the mixture, highly correlated mixture components, potential synergistic effects of mixture components, and difficulties in measurement. We extend recently proposed nonparametric Bayes shrinkage priors for model selection to investigations of complex mixtures by developing a formal hierarchical modeling framework to allow different degrees of shrinkage for main effects and interactions and to handle truncation of exposures at a limit of detection. The methods are used to shed light on data from a study of endometriosis and exposure to environmental polychlorinated biphenyl congeners.
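One standard way to handle truncation at a limit of detection within a Gibbs sampler is data augmentation: exposures below the LOD are imputed from the model's truncated conditional at each sweep. The Python sketch below illustrates this for a deliberately simplified normal exposure model with known variance; the paper's full hierarchical shrinkage model is not reproduced, and all names are illustrative.

    import numpy as np
    from scipy.stats import truncnorm

    rng = np.random.default_rng(1)

    # Simulated exposures, left-censored at a limit of detection (LOD)
    true_mu, sd, lod = 0.0, 1.0, -0.5
    x = rng.normal(true_mu, sd, size=500)
    observed = np.where(x >= lod, x, np.nan)   # below-LOD values unobserved
    censored = np.isnan(observed)

    mu = 0.0                                         # current parameter state
    x_aug = np.where(censored, lod - 0.5, observed)  # initial imputations

    for it in range(200):                            # two-block Gibbs sweep
        # 1) impute censored exposures from N(mu, sd^2) truncated above at lod
        b = (lod - mu) / sd
        x_aug[censored] = truncnorm.rvs(-np.inf, b, loc=mu, scale=sd,
                                        size=censored.sum(), random_state=rng)
        # 2) update mu given completed data (flat prior, known sd, for simplicity)
        mu = rng.normal(x_aug.mean(), sd / np.sqrt(len(x_aug)))

    print(f"posterior draw of mu after augmentation: {mu:.3f}")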
mpower: An R Package for Power Analysis via Simulation for Correlated Data
Estimating sample size and statistical power is an essential part of a good
study design. This R package allows users to conduct power analysis based on
Monte Carlo simulations in settings in which consideration of the correlations
between predictors is important. It runs power analyses given a data generative
model and an inference model. It can set up a data generative model that
preserves dependence structures among variables given existing data
(continuous, binary, or ordinal) or high-level descriptions of the
associations. Users can generate power curves to assess the trade-offs between
sample size, effect size, and power of a design. This paper presents tutorials
and examples focusing on applications for environmental mixture studies when
predictors tend to be moderately to highly correlated. It easily interfaces
with several existing and newly developed analysis strategies for assessing
associations between exposures and health outcomes. However, the package is
sufficiently general to facilitate power simulations in a wide variety of
settings.
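The simulation logic the package automates can be sketched in a few lines. The Python below is an illustrative re-implementation of the general idea (correlated predictors, a generative outcome model, rejection rates across sample sizes), not the mpower API; the effect size, correlation, and function names are assumptions.

    import numpy as np
    from scipy import stats

    def simulate_power(n, beta1=0.2, rho=0.6, n_sims=500, alpha=0.05, seed=0):
        """Monte Carlo power for the first predictor's OLS coefficient,
        with two correlated predictors."""
        rng = np.random.default_rng(seed)
        cov = np.array([[1.0, rho], [rho, 1.0]])   # predictor correlation
        rejections = 0
        for _ in range(n_sims):
            X = rng.multivariate_normal(np.zeros(2), cov, size=n)
            y = beta1 * X[:, 0] + rng.normal(size=n)
            Xd = np.column_stack([np.ones(n), X])
            coef, *_ = np.linalg.lstsq(Xd, y, rcond=None)
            resid = y - Xd @ coef
            sigma2 = resid @ resid / (n - 3)
            se = np.sqrt(sigma2 * np.linalg.inv(Xd.T @ Xd)[1, 1])
            pval = 2 * stats.t.sf(abs(coef[1] / se), df=n - 3)
            rejections += pval < alpha
        return rejections / n_sims

    # Power curve: trade-off between sample size and power for a fixed effect
    for n in (50, 100, 200, 400):
        print(f"n = {n:4d}  power ~ {simulate_power(n):.2f}")

Raising rho in this sketch shows why correlation matters: collinearity inflates the standard error and lowers power at a given sample size.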
Accelerometry-Assessed Latent Class Patterns of Physical Activity and Sedentary Behavior With Mortality
Latent class analysis provides a method for understanding patterns of physical activity and sedentary behavior. This study explored the association of accelerometer-assessed patterns of physical activity/sedentary behavior with all-cause mortality.
Bayesian Functional Principal Component Analysis using Relaxed Mutually Orthogonal Processes
Functional Principal Component Analysis (FPCA) is a prominent tool to
characterize variability and reduce dimension of longitudinal and functional
datasets. Bayesian implementations of FPCA are advantageous because of their
ability to propagate uncertainty in subsequent modeling. To ease computation,
many modeling approaches rely on the restrictive assumption that functional
principal components can be represented through a pre-specified basis. Under
this assumption, inference is sensitive to the basis, and misspecification can
lead to erroneous results. Alternatively, we develop a flexible Bayesian FPCA
model using Relaxed Mutually Orthogonal (ReMO) processes. We define ReMO
processes to enforce mutual orthogonality between principal components to
ensure identifiability of model parameters. The joint distribution of ReMO
processes is governed by a penalty parameter that determines the degree to
which the processes are mutually orthogonal and is related to ease of posterior
computation. In comparison to other methods, FPCA using ReMO processes provides
a more flexible, computationally convenient approach that facilitates accurate
propagation of uncertainty. We demonstrate our proposed model using extensive
simulation experiments and in an application to study the effects of
breastfeeding status, illness, and demographic factors on weight dynamics in
early childhood. Code is available on GitHub at
https://github.com/jamesmatuk/ReMO-FPC
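The role of the penalty parameter can be illustrated with a small sketch. The quadratic form below, which shrinks pairwise inner products between components toward zero, is one plausible reading of the relaxed-orthogonality idea and is an assumption for illustration, not the paper's exact ReMO construction.

    import numpy as np

    def relaxed_orthogonality_logpenalty(F, lam):
        """Penalty on K component functions evaluated on a common grid
        (rows of F): -lam/2 * sum of squared pairwise inner products.
        Larger lam holds the components closer to mutual orthogonality."""
        G = F @ F.T                        # Gram matrix of the components
        off_diag = G - np.diag(np.diag(G))
        return -0.5 * lam * np.sum(off_diag ** 2)

    grid = np.linspace(0, 1, 100)
    # two candidate principal-component functions on the grid
    F_collinear = np.vstack([np.sin(2*np.pi*grid), np.sin(2*np.pi*grid) + 0.1])
    F_orthogonal = np.vstack([np.sin(2*np.pi*grid), np.cos(2*np.pi*grid)])

    for name, F in [("nearly collinear", F_collinear),
                    ("orthogonal", F_orthogonal)]:
        print(f"{name:16s} penalty = "
              f"{relaxed_orthogonality_logpenalty(F, lam=1.0):.2f}")

A finite penalty weight relaxes exact orthogonality constraints, which is what keeps posterior computation tractable while still identifying the components.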
Identifiable and interpretable nonparametric factor analysis
Factor models have been widely used to summarize the variability of
high-dimensional data through a set of factors with much lower dimensionality.
Gaussian linear factor models have been particularly popular due to their
interpretability and ease of computation. However, in practice, data often
violate the multivariate Gaussian assumption. To characterize higher-order
dependence and nonlinearity, models that include factors as predictors in
flexible multivariate regression are popular, with Gaussian process latent
variable models (GP-LVMs) using GP priors for the regression function and
variational autoencoders (VAEs) using deep neural
networks. Unfortunately, such approaches lack identifiability and
interpretability and tend to produce brittle and non-reproducible results. To
address these problems by simplifying the nonparametric factor model while
maintaining flexibility, we propose the NIFTY framework, which parsimoniously
transforms uniform latent variables using one-dimensional nonlinear mappings
and then applies a linear generative model. The induced multivariate
distribution falls into a flexible class while maintaining simple computation
and interpretation. We prove that this model is identifiable and empirically
study NIFTY using simulated data, observing good performance in density
estimation and data visualization. We then apply NIFTY to bird song data in an
environmental monitoring application.
Comment: 50 pages, 17 figures.
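The generative structure described above (uniform latents pushed through one-dimensional nonlinear maps, then a linear model) can be sketched as follows; the particular transformations, dimensions, and noise scale are illustrative assumptions, not the paper's specification.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k, p = 1000, 2, 5                 # samples, latent dims, observed dims

    # 1) uniform latent variables
    z = rng.uniform(0, 1, size=(n, k))

    # 2) one-dimensional nonlinear transforms, one per latent coordinate
    #    (these particular maps are chosen for illustration only)
    transforms = [lambda u: np.log(u / (1 - u)),   # logit
                  lambda u: u ** 3]                # cubic
    x = np.column_stack([f(z[:, j]) for j, f in enumerate(transforms)])

    # 3) linear generative model with Gaussian noise
    Lambda = rng.normal(size=(p, k))     # loading matrix
    y = x @ Lambda.T + 0.1 * rng.normal(size=(n, p))

    print("observed data shape:", y.shape)

Restricting the nonlinearity to coordinate-wise maps is what keeps the induced multivariate distribution flexible yet identifiable and cheap to compute.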
Bayesian joint modeling of chemical structure and dose response curves
Today there are approximately 85,000 chemicals regulated under the Toxic
Substances Control Act, with around 2,000 new chemicals introduced each year.
It is impossible to screen all of these chemicals for potential toxic effects
either via full organism in vivo studies or in vitro high-throughput screening
(HTS) programs. Toxicologists face the challenge of choosing which chemicals to
screen, and predicting the toxicity of as-yet-unscreened chemicals. Our goal is
to describe how variation in chemical structure relates to variation in
toxicological response to enable in silico toxicity characterization designed
to meet both of these challenges. With our Bayesian partially Supervised Sparse
and Smooth Factor Analysis (BS3FA) model, we learn a distance between chemicals
targeted to toxicity, rather than one based on molecular structure alone. Our
model also enables the prediction of chemical dose-response profiles based on
chemical structure (that is, without in vivo or in vitro testing) by taking
advantage of a large database of chemicals that have already been tested for
toxicity in HTS programs. We show superior simulation performance in distance
learning and modest to large gains in predictive ability compared to existing
methods. Results from the high-throughput screening data application elucidate
the relationship between chemical structure and a toxicity-relevant
high-throughput assay. An R package for BS3FA is available online at
https://github.com/kelrenmor/bs3fa
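The shared-factor logic behind a toxicity-targeted distance can be sketched as follows. This simplified linear version, in which shared latent scores load on both structure features and dose-response curves, distance is taken between scores, and a new chemical's curve is predicted by projecting its structure onto the latent space, is an illustration of the idea only, not the BS3FA model; all names and dimensions are assumptions.

    import numpy as np

    rng = np.random.default_rng(4)
    n, k, p_struct, p_dose = 200, 3, 30, 10

    eta = rng.normal(size=(n, k))             # shared latent factors
    Theta = rng.normal(size=(p_struct, k))    # structure loadings
    Phi = rng.normal(size=(p_dose, k))        # dose-response loadings

    X = eta @ Theta.T + 0.1 * rng.normal(size=(n, p_struct))  # features
    Y = eta @ Phi.T + 0.1 * rng.normal(size=(n, p_dose))      # curves

    # Toxicity-targeted distance: Euclidean distance between latent
    # scores, rather than raw structural distance
    def latent_distance(eta, i, j):
        return float(np.linalg.norm(eta[i] - eta[j]))

    # For a chemical with structure x_new only (no HTS data), predict its
    # curve by a least-squares projection onto the latent space
    x_new = X[0]
    eta_hat, *_ = np.linalg.lstsq(Theta, x_new, rcond=None)
    y_pred = Phi @ eta_hat
    print("distance(0,1) =", round(latent_distance(eta, 0, 1), 2),
          "| predicted curve shape:", y_pred.shape)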